Along an individual lifetime, stem cells replicate and suffer modifications in their DNA content. I model the modifications in the DNA of a single cell as a Levy flight, made up of small-amplitude Brownian motions plus rare large-jump events. The distribution function of mutations has a long tail, in which cancer events are located far away. The probability of cancer in a given tissue is roughly estimated as $a N_{\rm cell} N_{\rm step}$, where $N_{\rm cell}$ is the number of stem cells and $N_{\rm step}$ is the number of replication steps in the evolution of a single cell. I test this expression against recent data on lifetime cancer risk, $N_{\rm cell}$ and $N_{\rm step}$ in different tissues. The coefficient $a$ takes values between $2\times10^{-15}$ and $2\times10^{-11}$, depending on the role played by carcinogenic factors and the immune response. The smallest values of $a$ correspond to cancers in which randomness plays the major role.

PACS numbers: 87.19.xj, 05.40.Fb, 87.23.Kg 


Spontaneous vs induced mutations. It is common knowledge that both the normal activity of stem cells in a healthy individual and external agents, such as ionizing or ultraviolet radiation or toxic substances (derived from smoking, for example), cause mutations [1]. Mutations related to the normal function of the cell are called spontaneous. They are thought to have a random origin. External agents, for their part, are viewed as causes of induced mutations.

The debate about spontaneous (random) and induced mutations and their role in carcinogenesis arose recently [3-10] with the article [2], in which data on the lifetime risk of cancer in different tissues are compiled, along with the number of stem cells, $N_{\rm cell}$, and the number of replication steps, $N_{\rm step}$.

The purpose of my paper is to present a model for mutations in stem cells and the genesis of cancer. Emphasis is put on the qualitative aspects. Detailed numerical simulations for different tissues are to be published elsewhere.

The accumulative character of mutations. The authors of Ref. [2] postulate that the probability of a given mutation (they are interested in cancer) should depend on the overall number of replication steps in the tissue. This assumption neglects the history in the evolution of each cell.

In my model, on the contrary, the time evolution of cells defines trajectories, as schematically represented in Fig. 1, where one of these trajectories is drawn as a red dashed line.

The idea of trajectories in the evolution of cells means that there are Markov chains [11] of mutations, where the change in the DNA of a cell at step $i+1$, $X_{i+1}$, comes from the change in the previous step plus an additional modification:

$$X_{i+1} = X_i + \delta. \qquad (1)$$

FIG. 1. (Color online) Schematic representation of the evolution of stem cells in a tissue. First, the cells divide until their number reaches $N_{\rm cell}$. There are $N^{\rm exp}_{\rm step} \approx \log_2 N_{\rm cell}$ steps in this clonal expansion phase. Further on, the number of cells is kept roughly constant. This means that the excess stem cells resulting from divisions go to replace damaged cells in the tissue or undergo programmed apoptosis. If there are $N^{\rm hom}_{\rm step}$ steps in this homeostasis phase, then the total number of replication steps along a trajectory is $N_{\rm step} = N^{\rm exp}_{\rm step} + N^{\rm hom}_{\rm step}$.

Measuring changes in the DNA. A single strand of human DNA contains around $3\times10^{9}$ bases of a four-letter alphabet: G, A, T, and C [1]. In order to measure changes in the DNA, one may use a variable similar to that of paper [12].

First, define an auxiliary variable at site $\alpha$ in the molecule: $u_\alpha(G) = 3/8$, $u_\alpha(A) = 1/8$, $u_\alpha(T) = -1/8$, and $u_\alpha(C) = -3/8$. Then, define a walk along the DNA:

$$y(\beta) = \sum_{\alpha=1}^{\beta} u_\alpha. \qquad (2)$$

As a function of $\beta$, the variable $y$ draws a profile of the DNA molecule, and modifications can be measured as $X(\beta) = y(\beta) - y_0(\beta)$, where $y$ corresponds to the mutated DNA and $y_0$ to the initial configuration. Of course, there are so many $X(\beta)$, three billion, that they are not of practical use. The strategy could be to restrict the analysis to certain coding regions in the DNA, for example, and for these regions to use variables measuring global changes or distances to the original function:

$$X_1 = \sum_{\alpha=1}^{L} \ldots, \qquad (3)$$

$$X_2 = \sum_{\alpha=1}^{L} \ldots \qquad (4)$$

($X_2$ being the second moment), etc. $L$ is the length of the coding region. The Shannon informational entropy [13] could also be of use.

In what follows, I shall assume that mutations in a 
given coding region are well characterized by a few global 
variables. 
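As an illustration of the walk of Eq. (2) and of the modification profile $X(\beta)$, the following Python sketch computes both for a short toy sequence. The function names, the toy sequences, and in particular the two global variables returned by `global_moments` (plain sums of $X$ and $X^2$ over the region) are assumptions made here for illustration, not necessarily the definitions of Eqs. (3) and (4). The sketch handles only point substitutions, so both sequences must have the same length.

```python
import numpy as np

# Weights u_alpha assigned to each base, as defined in the text.
U = {"G": 3 / 8, "A": 1 / 8, "T": -1 / 8, "C": -3 / 8}


def walk_profile(seq):
    """Return y(beta) = sum of u_alpha for alpha <= beta, beta = 1..len(seq)."""
    return np.cumsum([U[base] for base in seq])


def modification_profile(mutated, original):
    """Return X(beta) = y(beta) - y0(beta) for two equal-length sequences."""
    if len(mutated) != len(original):
        raise ValueError("sequences must have the same length")
    return walk_profile(mutated) - walk_profile(original)


def global_moments(x):
    """Hypothetical global variables: sums of X and X^2 over the region."""
    return float(np.sum(x)), float(np.sum(x ** 2))


# Toy example (assumed sequences): a single T -> C substitution at site 4.
original = "GATTACA"
mutated = "GATCACA"
x = modification_profile(mutated, original)
x1, x2 = global_moments(x)
print(x)        # zero before the substitution, shifted from site 4 onwards
print(x1, x2)
```

A single substitution shifts the whole profile downstream of the mutated site, so $X(\beta)$ is a nonlocal record of a local change; this is one reason to prefer a few global variables over the full profile.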

Other heritable gene variations, not involving changes in the DNA sequence [14], could, in principle, be incorporated, although presently I do not have a proposal for a variable measuring them.

Modeling mutations. The $\delta$ term in Eq. (1) represents mutations at step $i+1$. It may come from partially repaired damage in the DNA that is fixed after replication, or from an undesired error in the replication process. It should be stressed that both the repair mechanisms and the replication process guarantee very high fidelities. The error introduced by the latter, for example, is around one mistaken base per $10^{9}$ bases in the DNA strand [1].
FIG. 2. (Color online) Schematic representation of a single-cell mutation trajectory. The starting point is $X = 0$. In the mutation space, I distinguished regions in which the DNA repair mechanism is active or damaged. The latter is one of the hallmarks of cancer, known as genetic instability [15].

Let me stress, once again, that $\delta$ is not the damage caused by endogenous or external factors, but the resulting modification after the action of the repair mechanisms and fixation. It is known, for example, that ionizing radiation may cause double-strand breaks in the DNA [16]. This kind of damage is very difficult to repair [17]. The repair mechanism itself may introduce large changes in the resulting DNA composition after a double-strand break event.

My proposal for $\delta$ is the following: $\delta = \delta_B + \delta_{LJ}$. The $\delta_B$ component corresponds to a Brownian motion with maximal amplitude $D_B$. Notice that $D_B = 1$ would mean roughly a change of one base in each replication step, because $u_\alpha(G) - u_\alpha(C) = 3/4$. This Brownian motion introduces local modifications in the DNA. After $N_{\rm step}$ replication steps, the characteristic dispersion of a trajectory due to this Brownian motion (something like the radius of the colored region near the origin in Fig. 2) is $D_B\sqrt{N_{\rm step}}$.


The large-jump component of $\delta$, $\delta_{LJ}$, on the other hand, is modeled with the help of rare events with total probability $p \ll 1$ and a probability density proportional to $1/\delta_{LJ}^{2}$, where the amplitude ranges from $D_B$ to infinity (in practice, I will introduce a cutoff, $D_{\max}$). The combination of the Brownian motion and the large-amplitude jumps leads to Levy flights [18] in the mutation space, schematically represented in Fig. 2.

Let me note that the distribution function associated with Levy flights is a fat- or long-tailed one. This fact could be related to the long-range correlations observed in the walks along the DNA [12].

The long-tail distribution function of mutations. Four parameters enter my oversimplified Levy model of mutations in stem cells: $N_{\rm cell}$, $N_{\rm step}$, $D_B$ and $p$. In order to fix ideas, I show a calculation with parameters that, although arbitrary, approximately fit the data on lifetime risk of cancer in the gallbladder tissue. The result, however, will be not only the probability of cancer, but the distribution function of mutations of any amplitude.

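I took $N^{\rm exp}_{\rm step} = 20$ steps in the expansion phase, which corresponds to $N_{\rm cell} \approx 10^{6}$. On the other hand, $N^{\rm hom}_{\rm step} = 47$ steps in the homeostasis phase, as in Ref. [2], thus $N_{\rm step} = 67$.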

I restrict the analysis to a coding region in the DNA of length $L = 10^{6}$. Around $10^{-3}$ bases are changed in each replication step, thus I take $D_B = 10^{-3}$. Cancer events are assumed to be at a distance $X_{\rm cancer} \gg D_B\sqrt{N_{\rm step}}$ from the origin. It is fixed to $X_{\rm cancer} = 1000$. The probability $p = 3.8\times10^{-5}$ was chosen in order to fit the lifetime risk of gallbladder cancer [19].

Simulations start from a single cell. After $N^{\rm exp}_{\rm step}$ division steps in the expansion phase, the $N_{\rm cell}$ trajectories are generated. These trajectories proceed $N^{\rm hom}_{\rm step}$ steps further in the homeostasis phase. In any replication step, either of the expansion or of the homeostasis phase, mutations are given by Eq. (1), where $\delta$ contains both the Brownian and the large-amplitude components.

The probability distribution function for mutations in a cell, $P(X)$, is the probability that a cell arrives at the end point with an amplitude $X$. I compute not $P(X)$, but the cumulative probability distribution, $P(|X| > Z)$, which is shown in Fig. 3.
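A minimal Python sketch of such a simulation is given below. It is a simplified stand-in, not the code used for Fig. 3: trajectories are treated as independent (the shared ancestry of cells during the expansion phase is ignored), the Brownian step is assumed to be uniform in $[-D_B, D_B]$, large jumps are assumed to have a random sign, and amplitudes are drawn from a density proportional to $1/\delta^2$ between $D_B$ and the cutoff $D_{\max}$ by inverse-transform sampling. All function and parameter names are hypothetical, and the example uses a value of $p$ larger than the one in the text so that the tail is populated with a modest number of trajectories.

```python
import numpy as np


def simulate_endpoints(n_cells, n_step, d_b, p, d_max, seed=0):
    """Return the end points X of n_cells independent Levy-flight trajectories."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_cells)
    inv_lo, inv_hi = 1.0 / d_b, 1.0 / d_max
    for _ in range(n_step):
        # Brownian component: assumed uniform in [-d_b, d_b].
        x += rng.uniform(-d_b, d_b, size=n_cells)
        # Rare large jumps, each cell hit with probability p per step.
        hit = rng.random(n_cells) < p
        n_hit = int(hit.sum())
        if n_hit > 0:
            # Inverse-transform sampling of a density ~ 1/delta^2 on [d_b, d_max).
            u = rng.random(n_hit)
            amplitude = 1.0 / (inv_lo - u * (inv_lo - inv_hi))
            sign = rng.choice([-1.0, 1.0], size=n_hit)
            x[hit] += sign * amplitude
    return x


def cumulative_probability(x, z):
    """Estimate P(|X| > z) as the fraction of end points beyond z."""
    return float(np.mean(np.abs(x) > z))


# Example with assumed parameters (p enlarged for better statistics).
n_step, d_b, p = 67, 1e-3, 1e-3
x = simulate_endpoints(200_000, n_step, d_b, p, d_max=1e3, seed=1)
z = 0.1
print(cumulative_probability(x, z), n_step * p * d_b / z)
```

The two printed numbers are the simulated tail probability and the asymptotic estimate $N_{\rm step}\,p\,D_B/Z$; they are expected to agree only up to statistical fluctuations, and only for $Z$ well beyond the Brownian radius.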

FIG. 3. (Color online) The average cumulative probability of mutations, $P(|X| > Z)$, in a coding segment of the DNA molecule. Points come from the numerical simulations, whereas the red solid line is a $1/Z$ fit to the tail. The Brownian radius, $D_B\sqrt{N_{\rm step}}$, is marked by a dashed line. Parameters are chosen in order that the slope in the tail reproduces the lifetime risk for cancer in the gallbladder tissue.

The Brownian radius, $\sqrt{N_{\rm step}}\,D_B \approx 8\times10^{-3}$, concentrating most of the points, is apparent in the figure. In addition, the tail can be fitted to a $1/Z$ dependence. The coefficient is roughly $N_{\rm step} D_B\, p$.
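The $1/Z$ tail and its coefficient can be motivated by a short estimate. The sketch below assumes that the large-jump density is normalized on $[D_B, \infty)$ (cutoff $D_{\max} \to \infty$), that at most one large jump matters per trajectory ($N_{\rm step}\,p \ll 1$), and that $Z \gg D_B\sqrt{N_{\rm step}}$, so that the Brownian part can be neglected. These simplifying assumptions are introduced here only for the estimate.

```latex
\begin{align*}
\rho(|\delta_{LJ}|) &= \frac{D_B}{\delta_{LJ}^{2}}, \qquad |\delta_{LJ}| \ge D_B, \\
P(|\delta_{LJ}| > Z) &= \int_{Z}^{\infty} \frac{D_B}{\delta^{2}}\, d\delta = \frac{D_B}{Z}, \\
P(|X| > Z) &\approx N_{\rm step}\; p\; \frac{D_B}{Z}.
\end{align*}
```

The last line counts the $N_{\rm step}$ independent chances, each of probability $p$, that a single jump carries the trajectory beyond $Z$.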

Notice that the probability of cancer in the tissue can be estimated as $N_{\rm cell}\,P(|X| > X_{\rm cancer})$. As $X_{\rm cancer}$ is in the tail of the distribution, we may use the asymptotic formula:

$${\rm risk} \approx N_{\rm cell} N_{\rm step} D_B\, p / X_{\rm cancer} = a\, N_{\rm cell} N_{\rm step}. \qquad (5)$$

For the gallbladder example, $a \approx 2.5\times10^{-11}$. Because $a \approx D_B\, p / X_{\rm cancer}$, there is an arbitrariness in the selection of the parameters $D_B$, $p$ and $X_{\rm cancer}$. However, it should be possible to select a unique meaningful set consistent with this numerical value. The asymptotic formula, Eq. (5), does not depend on the choice of parameters.
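The arbitrariness in the parameters can be made explicit with a few lines of Python. The sketch evaluates $a = D_B\,p/X_{\rm cancer}$ and the risk of Eq. (5) for the illustrative parameters of this section, and shows that rescaling $p$ and $X_{\rm cancer}$ by the same factor leaves $a$ unchanged. The function names are hypothetical, and $N_{\rm cell} = 2^{20}$ is an assumption derived from the 20 expansion steps; with these inputs $a$ comes out of the order of $10^{-11}$.

```python
def coefficient_a(d_b, p, x_cancer):
    """Coefficient a = D_B * p / X_cancer of the asymptotic formula."""
    return d_b * p / x_cancer


def lifetime_risk(a, n_cell, n_step):
    """Asymptotic risk estimate a * N_cell * N_step."""
    return a * n_cell * n_step


n_cell = 2 ** 20          # assumed: 20 doublings starting from one cell
n_step = 20 + 47          # expansion plus homeostasis steps

a_ref = coefficient_a(1e-3, 3.8e-5, 1000.0)
# Same D_B, but p and X_cancer both multiplied by 10: a is unchanged.
a_alt = coefficient_a(1e-3, 3.8e-4, 10000.0)
risk = lifetime_risk(a_ref, n_cell, n_step)
print(a_ref, a_alt, risk)
```

Only the combination $D_B\,p/X_{\rm cancer}$ is constrained by the risk data, so fixing the individual parameters requires independent information.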

Analysis of the data on cancer in different tissues. I use Eq. (5) in order to re-examine the data presented in paper [2]. The qualitative idea is the following: the lowest values of $a$ correspond to tissues in which mutations are close to the naturally occurring ones in a healthy individual. On the contrary, high values of $a$ indicate the presence of strong abnormal conditions or external factors in the genesis of cancer. In Fig. 4, I show the results for the lifetime risk of cancer per stem cell versus $N_{\rm step}$. This magnitude can be directly related to the formulae of the previous sections. In order to facilitate the analysis, the studied cancers are divided into groups.

Group I includes 11 points in the figure, located in a band delimited by coefficients $2\times10^{-14} < a < 10^{-13}$. For lack of a better name, I call this set the normal group. I use a solid red line in order to distinguish the bottom edge, and a dashed red line for the top one. In this set, randomness seems to play an important role in the genesis of cancer, as originally claimed in Ref. [2]. The fact that this group is composed of very different tissues, from the medulloblastoma to the colorectal adenocarcinoma, is, perhaps, the confirmation that, under unperturbed conditions, tissues in the body evolve in a very similar way. The starting point in the Levy model, $D_B$, $p$ and $X_{\rm cancer}$ should take very similar values for all of them.

For the coefficient $a$, we may write the expression:

$$a = {\rm ERS} \times 2\times10^{-14}. \qquad (6)$$

The prefactor, ERS, a kind of measure of the effects of lifestyle or external carcinogens, takes values between 1 and 5 (see Table I), meaning that, in principle, by means of proper corrective measures, the risk of cancer could be reduced, for example, around twofold in the colorectal adenocarcinoma, fourfold in the basal cell carcinoma, or fivefold in the lung adenocarcinoma in the non-smoker sub-population.
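Equation (6) is a simple rescaling, which the following Python sketch makes concrete. The reference value $2\times10^{-14}$ is taken from Eq. (6); the function names are hypothetical, and the example inputs are the band edges of Group I and the gallbladder value of $a$ quoted earlier.

```python
A_REF = 2e-14  # reference coefficient appearing in Eq. (6)


def ers_from_a(a):
    """Extra Risk Score corresponding to a coefficient a."""
    return a / A_REF


def a_from_ers(ers):
    """Coefficient a corresponding to a given Extra Risk Score."""
    return ers * A_REF


for a in (2e-14, 1e-13, 2.5e-11):
    print(f"a = {a:.1e}  ->  ERS = {ers_from_a(a):.0f}")
```

Because Eq. (5) is linear in $a$, the ERS can be read directly as the factor by which the estimated risk would drop if the tissue were brought back to the reference value.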

Group II, with five points in the figure, includes cases in which genetic or viral causes are predominant. Genetic predisposition means that mutations start at a point closer than usual to the cancer region. Thus, the distance $X_{\rm cancer}$ is much shorter and the probability dramatically increases. The ERS index exhibits very high values in this set.

The abnormal values of ERS for the four cases contained in Group III have, in my opinion, an immunological origin. Indeed, germinal cells and the brain are partially isolated from the immune agents. Our body uses barriers in order to protect these tissues against infections, but the barriers cannot protect against tumors, which come from inside. From the point of view of cancer, they are immunodepressed tissues.

Thus, I may say that $a_{\rm prot} \approx 2\times10^{-14}$ is a reference value for a tissue protected by a normal immune system, whereas $a_{\rm depr} \approx 2\times10^{-12}$ (100 times higher) refers to immunodepression conditions.

On the other hand, the extremely low value of $a$ for the small intestine adenocarcinoma (eight times lower than normal) can have no other explanation than overprotection by the immune system. One may speculate that the small intestine is a possible entrance door for the microbiota living in the colon, and as such it requires special protection. Paneth cells, Peyer's patches, and other structures concentrated in the distal ileum are perhaps responsible for this reinforced protection. This fact should be further studied. If confirmed, one can even imagine therapies against cancer or other illnesses exploiting this extra capacity of the small intestine.

Finally, there is a group of 11 cancers (5 tissues) exhibiting abnormally high values of the ERS index, presumably related to strong external factors. One example is lung adenocarcinoma, for which the concurrence of radioactive radon and smoking produces a 90-fold increase of the slope. The extreme case in this group is gallbladder non-papillary adenocarcinoma, with an index ERS = 1300, the understanding of which is a real challenge.

Cancer type                               ERS

Group I. Normal
  Hepatocellular C                        1.13
  Melanoma                                1.16
  Pancreatic endocrine C                  1.23
  Pancreatic ductal AC                    1.45
  Medulloblastoma                         1.49
  Myeloid leukemia                        1.54
  Duodenal AC                             1.93
  Lymphocytic leukemia                    1.95
  Colorectal AC                           2.04
  Basal Cell C                            4.02
  Lung AC (non-smokers)                   5.15

Group II. Viral and Genetic
  Hepatocellular C with HCV              11.29
  Colorectal AC with Lynch               21.30
  Head and Neck SCC with HPV            122.96
  Colorectal AC with FAP                204.51
  Duodenal AC with FAP                  225.29

Group III. Immune
  Small intestinal AC                     0.12
  Glioblastoma                           14.48
  Testicular germinal cell               52.78
  Ovarian germinal cell                  79.86

Group IV. Abnormal
  Head and Neck SCC                      21.38
  Osteosarcoma (Head)                    70.02
  Esophageal SCC                         79.44
  Thyroid medullary C                    84.22
  Lung AC (smokers)                      92.77
  Osteosarcoma (Arms)                   124.72
  Osteosarcoma (Pelvis)                 138.08
  Osteosarcomas                         153.04
  Thyroid papillary and follicular C    239.78
  Osteosarcoma (Legs)                   266.49
  Gallbladder non papillary AC         1299.58

TABLE I. The Extra Risk Score (ERS) index of Eq. (6) for cancer in different tissues.

Mutations in bacteria. With appropriate parameters, my Levy model can be applied as well to mutations in bacteria. I recall the extremely interesting Long-Term Evolution Experiment with E. coli, conducted by Prof. R. Lenski and his group [20]. Besides many other results, they report frequencies at which a mutation with damage in the DNA repair mechanisms becomes dominant in a population [21]. This mutator phenotype has in common with cancer, besides DNA instability, that it should be far away in the tail of the probability distribution. Thus, I may use Eq. (5) for the probability of occurrence, and determine the coefficient $a$.

The number of cultures they use is small, 12. Thus, I expect statistical errors of the order of $1/\sqrt{12} \approx 0.3$ for the probability. Nevertheless, they report that the mutator phenotype becomes dominant in two cultures (cumulative probability 1/6) when $N_{\rm step} \approx 2500$ to $3000$, in a third culture (cumulative probability 1/4) when $N_{\rm step} \approx 8500$, and in a fourth culture (cumulative probability 1/3) when $N_{\rm step} \approx 15000$. From these data and the number of evolving trajectories, $N_{\rm cell} \approx 5\times10^{6}$, I obtain $a_{\rm bact} \approx 5\times10^{-12}$. It is remarkable that $a_{\rm bact}$ is of the same order of magnitude as $a_{\rm depr}$. Details can be found in Ref. [22].

Concluding remarks. In my model, stem cells draw Levy flights in the mutation space. The small-amplitude Brownian component is characterized by a radius $D_B\sqrt{N_{\rm step}}$, whereas the rare large jumps give rise to a long tail $\sim 1/Z$ in the cumulative probability distribution. Cancer events are located in the tail. Their rate can be estimated from Eq. (5), where $a \approx D_B\, p/X_{\rm cancer}$. Variations in $a$ are mainly related to variations in $p$, the probability of nonlocal changes in the DNA. Trajectories in the mutation space are always random; external carcinogens basically increase the probability $p$. The ERS index defined in Eq. (6) has, thus, a clear meaning as a reference to a normal tissue, unperturbed by external factors. Re-examination of the data reported in Ref. [2] reveals groups of cancers, which range from normal tissues (randomness-dominated cancers) to abnormal tissues, in which external carcinogens play the major role. Particularly interesting is the small intestine, which seems to be overprotected by the immune system. My model stresses the role of mutations in the genesis of cancer. It is reasonable to expect, however, a formula like Eq. (5) to be valid even when epigenetic or microenvironment factors are taken into account [23].